The Polygraph Place

  Polygraph Place Bulletin Board
  Professional Issues - Private Forum for Examiners ONLY
  Directed Lie Screening Test (Page 2)

Author Topic:   Directed Lie Screening Test
blalock
Member
posted 05-31-2011 12:49 PM
In terms of accuracy, each of these systems would work about the same. However, you may see differences in inconclusive rates, with ESS producing fewer inconclusives. Additionally, using ESS will give decision-makers and administrators more information with which to make a decision (p-values). You will also see higher inter-rater reliability with ESS. ESS is much easier to use, and ESS does not use criteria that do not seem to contribute to increased accuracy (e.g., complex EDA response, decreased HR). ESS does not assume a linear relationship with magnitude of response and ratios, as traditionally used in the other scoring systems. ESS principles are much easier to defend with the literature. The ESS research is extensive, and is still growing. Cut-scores are not arbitrarily established. There is NO reason not to use ESS, and plenty of reasons TO use it.

------------------
Ben

ben@PolygraphToday.com

rnelson
Member
posted 06-01-2011 05:50 AM
Define "best."

r

------------------
"Gentlemen, you can't fight in here. This is the war room."
--(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)


skar
Member
posted 06-01-2011 08:30 AM
Thanks.
I like ESS. For the present I use it for single-issue PLC tests.
As I understand for DLST I can use the cut-scores:
SR <= -3 in any spot (p <.05)
NSR >= +1 at all spots (p <.1)
Can I score PLE with ESS, using the same cut-scores and p-values?
Is there an overall horizontal total score for SR (p <.05)?

[This message has been edited by skar (edited 06-01-2011).]

rnelson
Member
posted 06-01-2011 01:56 PM
skar,

Those cutscores are correct.

You can also use +2 at all spots for alpha < .05 - we are hearing that some people like the assurance of the more conservative alpha, and +2 is not difficult to achieve. Just keep in mind that the overall accuracy of the polygraph seems to converge at just under or just over 90%, so .05 is well...

Anyway, there are no ESS normative data that include the pulse oximeter. So I cannot give you any evidence-based recommendation on that. The conservative thing to do, knowing that the PO2 sensor is a very weak component, is not to change anything.

As for your question about horizontal - there is nothing horizontal about the DLST or ESS. Horizontal is a geometric concept. Polygraph scores can be arranged horizontally or vertically - however you like - with no change to the test result. Polygraph scores are actually a 3-dimensional matrix of values (sensor X question X iteration). Horizontal and vertical are therefore silly and confusing terms in this context.

We have tended, in the past, to confound our ability to understand our own work by using terminology that promotes misunderstanding. "Horizontal" is one such term. Another confusing term is "spot," by which we mean "sub-total." Then add to this our obscure "multi-issue" and "multi-facet," which we use to discuss the concept of "independence" - which refers to the notion of whether or not the variance of individual test stimuli is affected by, and affects, the variance of the other test stimuli.

It is acceptable to combine independent stimuli if the variances of response to the independent stimuli do not differ significantly (statistically significantly). The best thing to do is to actually do a statistical test on the variance. This is what OSS-3 does to aggressively reduce inconclusives.

If you use the Federal 7-position TDA model then you can, and should, use the Federal rules for this technique - which do make use of the grand total - combining the response variance of the independent stimuli. The Federal rules include a requirement for + at all sub-totals - which is a procedural way of ensuring that the variance of all sub-totals belong to the distribution of truthful cases. (and, of course, if the variance of all subtotals belongs to the distribution of truthful scores, then any differences are not significant).

ESS decision rules are the spot-score-rule, which assumes independence, and does not attempt to aggregate the variance of the subtotals. ESS design is to ensure decision accuracy and reduce inconclusives by calculating the normative cutscore using an inverse of the Sidak correction for independent stimuli, to correct for deflation of alpha, when calculating the probability that a person who is deceptive to one or more of the independent test stimuli would produce a truthful score to all of the test stimuli. Inverse Sidak is used to mathematically inflate the alpha, knowing that it will deflate as a result of the testing condition and decision rules.
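As a rough illustration of the Sidak arithmetic mentioned above, here is a minimal sketch. It shows only the textbook Sidak correction and its algebraic inverse; the exact way ESS applies them to derive its normative cutscores is not given in this thread, and the function names and the choice of k = 3 questions are assumptions made for the example.

```python
def sidak(alpha: float, k: int) -> float:
    """Textbook Sidak correction: per-comparison alpha for k independent comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


def inverse_sidak(alpha: float, k: int) -> float:
    """Algebraic inverse of the Sidak correction: inflates alpha across k comparisons."""
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    k = 3          # assumed number of independent test stimuli (illustrative only)
    alpha = 0.05   # nominal alpha
    inflated = inverse_sidak(alpha, k)
    print("inflated alpha:", inflated)
    # Applying the ordinary Sidak correction to the inflated value recovers alpha.
    print("round trip:", sidak(inflated, k))
```

The round trip is the point of the sketch: the inverse function inflates alpha by exactly the amount that the ordinary correction would later deflate it.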

Why all the obsession over science and statistics?

Well...

all fields of forensic science are increasingly subject to expectations to account for themselves.

quote:
http://www.forensicmag.com/news/judge-rules-dui-blood-tests-require-error-rate-report
Judge Rules DUI Blood Tests Require Error Rate Report

May 24, 2011
Michigan's 79th District Court Judge Peter Wadel refused to admit blood-alcohol results in a drunken-driving case because the state crime lab does not report an error rate, or margin of error, along with blood-alcohol results.

East Lansing attorney Mike Nichols, who is handling the case in Mason County said there are no absolutes in science. Not including a range of possible results, Nichols said, ignores the uncertainties in the collection, handling, analysis, and reporting process.

Currently Washington and Michigan are the only states in which judges have made rulings challenging blood-alcohol tests, but this ruling is sure to have impacts across Michigan and the country.

Source: Lansing State Journal


The simple version of all this is

-3 in any = SR (a < .05)
+1 in all = NSR (a < .1)
or
+2 in all = NSR (a < .05)

and...

if it ain't broke then don't fix it.

A lot of very intense work went into this, even though it seems simple and convenient in the end.
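The simple version of the rules above can be sketched in a few lines. This is a minimal illustration of the cutscores as stated in this post (-3 in any subtotal for SR, +1 or +2 in all subtotals for NSR); the function name, the argument names, and the "INC" label for everything in between are assumptions made for the example, not part of any published software.

```python
from typing import Sequence


def ess_dlst_decision(subtotals: Sequence[int], sr_cut: int = -3, nsr_cut: int = 1) -> str:
    """Apply the spot-score rule to per-question subtotals.

    sr_cut  : a subtotal at or below this value in ANY spot gives SR.
    nsr_cut : subtotals at or above this value in ALL spots give NSR
              (+1 for alpha < .1, +2 for alpha < .05, per the post above).
    """
    if not subtotals:
        raise ValueError("at least one subtotal is required")
    if any(s <= sr_cut for s in subtotals):
        return "SR"
    if all(s >= nsr_cut for s in subtotals):
        return "NSR"
    return "INC"  # neither rule satisfied


if __name__ == "__main__":
    print(ess_dlst_decision([2, 1, 3]))             # all spots at +1 or better
    print(ess_dlst_decision([2, -3, 4]))            # one spot at -3
    print(ess_dlst_decision([1, 2, 1], nsr_cut=2))  # more conservative NSR cutscore
```

Note that the SR check runs first, so a single subtotal at or below -3 decides the result regardless of the other subtotals, and no grand total is ever computed.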

.02

r

------------------
"Gentlemen, you can't fight in here. This is the war room."
--(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)


[This message has been edited by rnelson (edited 06-01-2011).]

Barry C
Member
posted 06-01-2011 07:26 PM
Time is short, but here's what I can offer quickly. Pasted below, God-willing, is a comparison of an "Automated ESS" study Ben and I are finishing up. (Well, it's really an extra portion of the study, which we've only recently added.) What we have is a set of 56 (lab) cases of single-issue ZCTs. Half are truthful. Cut scores are +2/-4 and -7 if INC at stage 1. (We didn't do any norming - just tested how well those scores work with and without FP.)

The CIs are computed using a Modified Wilson Score Interval - not the more common Wald method.
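For readers who want to see the difference between the two interval methods, here is a minimal sketch. It implements the ordinary Wald interval and the standard (unmodified) Wilson score interval; the "modified" variant used for the study is not reproduced here, and the function names and the example counts are assumptions chosen for illustration only.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wald interval: p +/- z * sqrt(p(1-p)/n), clipped to [0, 1]."""
    p = successes / n
    half = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Standard Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical counts, not the study data: 25 correct decisions out of 28 cases.
    print("Wald:  ", wald_interval(25, 28))
    print("Wilson:", wilson_interval(25, 28))
    # At the ceiling (28 of 28) the Wald interval collapses to zero width,
    # while the Wilson interval still has a lower bound below 1.
    print("Wald at ceiling:  ", wald_interval(28, 28))
    print("Wilson at ceiling:", wilson_interval(28, 28))
```

The ceiling case is one practical reason to prefer a score-type interval with small samples and high accuracy rates.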

What it shows is that Ray's suspicions are correct. Now, cross your fingers...

[Image not preserved: results table hosted on Photobucket]

rnelson
Member
posted 06-01-2011 09:25 PM
That is super interesting.

there is no real difference.

and, your results look darn near just about like the results from other samples.

we have a recently completed study, in which a part of the work was a comparison of ESS and 7 position results.

What was interesting to us was that the 7 position and ESS performed similarly when we controlled for rules and cutscores. And, the correlation was very high when we transformed the 7 position scores to ESS scores.

What that seems to suggest is that the 7 position model and ESS (3 position with weighted EDA) seem to extract about the same diagnostic information from the data - at least in so far as our integer based manual scoring models make use of the data.

We also found a high correlation of automated ESS and manual ESS scores - meaning that eye-ball measurement/scoring can be just about as good as mechanical/automated measurement - if we pay attention to the right valid features. the other thing this means is that any advantage that automated scoring models have over human scorers is not so much due to the features or precision of the measurements but possibly more due to other factors such as reliability, good math, good decision rules, good cutscores etc.

Will you write a paper on these data?

r

------------------
"Gentlemen, you can't fight in here. This is the war room."
--(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)


[This message has been edited by rnelson (edited 06-01-2011).]

Barry C
Member
posted 06-01-2011 09:57 PM
It's actually the holdout sample on which we cross-validated the PA algorithm with the additional feature of "FPLL." Thus, it's the same data, measured the same way, except we assigned ESS scores. I suspect that since FP adds so little (but every bit counts!), John Kircher is correct when he says we need to trust the significant difference (increase) we see when we conduct the regression analysis. (We saw the same thing with simulations (using the algorithm), but really by narrowing the CIs / increasing precision so we could "see" the difference.)

You see a little better performance here with the FP data, but it's not statistically different from the "standard" channels with a sample size as small as it is. Like I said, it's just an extra talking point for now, but there is some support for using it and sticking with the normed data - at least for now.

In any event, I think we're getting to the point at which our algorithms and (ESS) hand-scores will be very beneficial for the examiner with nobody to look over his or her shoulder, since we can let them "look" at the same (identical) features and just do different versions of "fancy math," I think you call it, and usually come to the same conclusions. What's exciting is that not only is ESS easy and just as diagnostic as what we've been used to, it's easily automated, meaning examiners can easily understand what the computerized "score checkerer" is doing.

Countermeasure data would be interesting....

But yes, it's working its way into the paper.

skar
Member
posted 06-02-2011 04:39 AM
Barry C, is this table for both PLC and DLC tests? I have asked about PLE for DLST because the pneumo channel with DLC is often zero (more than with PLC) and PLE could give additional information instead of the pneumo channel.

[This message has been edited by skar (edited 06-02-2011).]

rnelson
Member
posted 06-02-2011 07:32 AM
Barry,

I am confused. Is this Kircher's holdout sample from his PA algorithm?

Who normed the PA algorithm with the PO2?

The replication of the PA algorithm that I made was normed on the OSS sample without the PO2.

Neither the presently available PA algorithm from Kircher nor the replication I did make use of the PO2.

Who has constructed a confirmed case sample with PO2 data?

Actually it's not "normed" with the PA - it is a discriminant function - but it's the same for practical discussion. I know you understand, but just so people know.

skar,

who cares if the pneumo data is mostly zero with DLC exams - they work. we should not start pushing or manipulating the data just because we don't like the way the numbers look.

also, it is a mistake to think of the PO2 or any of the components as "instead" of or replacing...

each component sensor is different. each one accesses different physiology. each is correlated with deception. the purpose of discriminant analysis and logistic regression is to define a structural model that includes an optimal weighted combination of data from an optimal array of sensors - to maximize the geometric/linear/numerical distance and difference between the truthful and deceptive cases.

however, it is not just the difference that matters. the variance also matters.

if we start tweaking on the structural model just because we don't like so many zeroes, then we could f^&k-up the model and decrease accuracy.

Just let the thing work.

If you want to tweak on the model, then do what we do - get some data and wind your own propeller around it.

But please don't write anything into your book, and don't offer any opinions or recommendations about field practices until we are really darn sure - from data, and replication, and critical review - about what is best. The polygraph profession is often confused and hurting, unable to make good use of our own scientific knowledge, because of people shedding their opinions all over the place without taking the time to do the really tedious statistical analyses that tell us what is actually best.

.02

r

------------------
"Gentlemen, you can't fight in here. This is the war room."
--(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)


Barry C
Member
posted 06-02-2011 08:29 AM
Skar,

I could zero out the pneumos and see what happens, but I predict Ray is correct.

Ray,

Now I'm confused. We didn't norm the data, as I stated above. All we did is use the raw measurements from our holdout sample on which we tested the PO2 when we added it to the PA analysis. We just used 1.1 to 1 ratios and let Excel "score" it out of curiosity as to how it might perform.

We did the same analysis that Kircher and Raskin did in 1988 except we added PO2 by measuring excursion (or what we call "line length" in the field). It is there we see the difference. Thus we used an RA to test if the PO2 added to the model (it did), and we did the DA to come up with the coefficients.

I just want to get this done, so norming may be left to others. My plan is to publish the means and SDs, etc, so all the data could be used in a resampling study if someone were so inclined. (I toyed with it a bit, but it really should be a different project.)

rnelson
Member
posted 06-02-2011 09:21 AM
OK, thanks Barry.

My thoughts.

The difference in the data you have shown above is not statistically significant.

The differences are small enough that people will probably not even notice, in the scores and result, whether they do or do not have the PO2 sensor. (Of course, if we have the sensor we will notice the presence of the sensor because we see it - but that is different.)

The non-significant effects that you have shown are more likely to be observed than any effect (improved criterion accuracy) actually observed by us field examiners who score the data visually (which is what ESS is - visual on-screen, non-mechanical, non-automated, quick-and-dirty, quick and accurate).

Use of a more accurate linear excursion measurement does seem to increase the diagnostic information obtained from the PO2. In the field we don't do excursion. We look visually for a constriction of the pulse-width amplitude. This is an external validity issue that will probably affect whether we can generalize your lab results to the field.

So, if excursion is a superior PO2 measurement paradigm compared to the old visual method, and if the contribution to criterion accuracy from PO2 excursion measurements is not significant - then it is unlikely that skar or others of us conducting tests in the field will be able to achieve any noticeable increase in criterion accuracy or reduced inconclusives with or without the PO2.

So, I will argue that we are still at the same place with respect to the PO2. If you have it you might enjoy it from time to time. If you don't have it then you probably will not notice or miss it much.

On the other hand, if we have structural coefficients and decision models that are shown repeatedly to work without PO2 data, then it is actually possible that anxious attempts by field examiners to "add more data" or "replace all those damn zeroes in the pneumo scores" could actually damage and weaken the accuracy of the test.

It's like taking a perfectly good sniper rifle and screwing with the sights right before you go to work. It's a very bad idea. Leave it alone and use it the way it is proven to work. Before we take something to the field we must prove it with more than someone's 'pinion (read: untested hypothesis).

anyway, if you have a sample of cases with PO2 data, why don't we develop a discriminant function or regression coefficients to determine the optimum weights for OSS-3 scores with the PO2?

as always,

all of this is just my...

.02

r

------------------
"Gentlemen, you can't fight in here. This is the war room."
--(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)


Ted Todd
Member
posted 06-02-2011 09:21 AM
Barry and Ray.

Confused seems to be a common denominator here. For the record, I would like to point out that I was confused long before either of you!

Ted

rnelson
Member
posted 06-02-2011 09:26 AM
funny ted,

complicated arguments and complicated analysis are necessary to avoid adding un-necessary complications that don't add anything and may detract from our work in the trenches.

is all simple in the end.

bigger is better.

double the EDA increases sensitivity to deception

if you have to squint it is a zero or an artifact

garbage in = garbage out

lather.

rinse.

repeat.


------------------
"Gentlemen, you can't fight in here. This is the war room."
--(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)


Ted Todd
Member
posted 06-02-2011 09:32 AM
Ray,

Now that....I get!

Ted

ktaylorCCPD
Member
posted 06-02-2011 10:43 AM
I use ESS. Ray will be presenting on ESS in Sept at APA in Austin..."the stars at night are big and bright" ...sorry got carried away for a second. See everyone in Sept.

Barry C
Member
posted 06-02-2011 12:04 PM
I don't know that we can make those conclusions with what we have. The DIA found a 2 to 5% increase in decisions with it, the same as we see with the algorithm. (I don't know if they published it yet, but I have a copy of the report someplace.) That may only be true if the scorer uses RLL and "FPLL" to score. Not sure of traditional methods as we've not tested it. Plus, we don't see significant differences in decisions with the algorithm unless we increase the sample size with simulations. I'll have to see if we get the same results if I remove any other channels. We do with the algorithm (when we look at decisions - not the RA).

Little time for anything today, but I'll see what I can find. I think I've looked already, but I can't remember.

rnelson
Member
posted 06-02-2011 12:48 PM
Barry,

I was not suggesting that you are floating your opinion without data. Just reiterating the standard caution.

peace,

r

------------------
"Gentlemen, you can't fight in here. This is the war room."
--(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)


Barry C
Member
posted 06-02-2011 08:10 PM
Okay, here are the results when I zero the pneumos and use the PO2, followed by when I zero the cardio and use PO2, followed, finally, with what happens when I zero the pneumos and PO2 (i.e., just EDA and cardio are scored):

[Image not preserved: results tables hosted on Photobucket]

It scares me a little bit to post this because it can be misunderstood. Just because the test "works" without the pneumo data or the cardio data, etc, doesn't mean we should scrap them. That's not the point. Remember: the sample size is pretty small: 56 cases total, half of which are truthful / half deceptive, so 28 of each. Lose or gain 1 case, and the numbers change, albeit not significantly.

So, if you zero out a lot or all of the pneumos in a DLCQT, don't sweat it - and no need to change the cutscores. I didn't change them here. They are all as described above.

I hope this works. I'm doing two things at once and so far I haven't gotten caught dividing my attention....

rnelson
Member
posted 06-03-2011 09:05 AM
These are really the questions for regression analysis.

I wish I had more time right now.

Anyway, your results look consistent with other data.

The PO2 adds only a tiny little bit that is not significant and probably would not be noticed by field examiners with or without the sensor.

But, as they say, every little bit helps.

We still have the problem that you are measuring vasomotor excursion, which will be more productive than the visual way we actually use PO2 data.

Your mileage may vary - which means to us that we may not benefit this much from having the PO2 in real life.

Although your results are not significant, they seem to suggest that inclusion of the pneumo seems to increase test sensitivity, and inclusion of the cardio seems to increase test specificity.

Now if you would just include the standard deviations along with your confidence intervals then I could calculate the standard errors of measurement, and the significance of differences in those mean scores. Maybe you have already done this.

Basically it is too soon to go crazy over a significant improvement over the tried and true combination of Pneumo EDA and Cardio.

And of course, EDA still drives the story for us.

Thanks Barry.

r

------------------
"Gentlemen, you can't fight in here. This is the war room."
--(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)


Barry C
Member
posted 06-04-2011 06:40 PM
Yes, more time. Need more time.

I have a page in my spreadsheet that calculates a z-test for (two) proportions, but I gave you the standard (Wald) point estimates and the Modified Wilson Score CI just because it was easy to change the former so you'd have a point of comparison with what's generally published. (The M-W Score point estimate is just the mid-point of the upper and lower intervals.) What's also missing are the MCC proportions (that I think Ray introduced me to), and that's where you see a significant difference between the FPLL and no FPLL samples using an automated form of ESS. I didn't test using Wald estimates, but I imagine things wouldn't change much.
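A z-test for two proportions of the kind mentioned above can be sketched as follows. This is the common pooled-variance form with a two-sided p-value; whether the spreadsheet uses the pooled or unpooled standard error is not stated in this thread, so that choice, the function name, and the example counts are assumptions.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple:
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("standard error is zero; both samples are all 0s or all 1s")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, not the study data: 26 of 28 correct vs 24 of 28 correct.
    z, p = two_proportion_z_test(26, 28, 24, 28)
    print("z =", z, "p =", p)
```

With samples as small as 28 per group, a difference of one or two cases produces a small z, which is consistent with the caution about sample size elsewhere in this thread.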

I need to check on a couple more things, and time has been tight lately - in part because I've needed extra time to make some changes to pass off some of what I do to others and give me more time for what I want. Oh the irony.

In regard to the CIs of proportions, hows and whys, problems, etc, here's one of the best papers I've found on the topic, for those who might be interested:
http://www-stat.wharton.upenn.edu/~tcai/paper/Binomial-StatSci.pdf

rnelson
Member
posted 06-05-2011 12:51 PM
Barry,

I think I'm having an out-of-body experience (feeling like Ted Todd and Jim Sackett) when you babble about Modified Wilson Score CIs and Wald point estimates. I know what you're talking about, but it's overkill for my tiny brain-cells.

I think the Matthews Correlation Coefficient is interesting - if we are still looking for the holy-grail of a single numerical index that can capture sensitivity and specificity, and INCs and FN and FP errors. It's OK. The long table of numbers is still more interesting, and it's easier to just bootstrap the SEM and calculate the CI the old fashioned way or just bootstrap the CI.

Thanks for a little light reading to help us all sleep.

Anyway - a question...

Do you have an exact calculation for the MCC of a sample? And the SEM for the MCC, so we can calculate the CI? (I never took the time to work this out, because I can just bootstrap it.)
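For reference, the standard definition of the Matthews Correlation Coefficient for a 2x2 decision table can be computed directly, as in the minimal sketch below. The function name and the example counts are assumptions. The standard formula has no cell for inconclusive results, so how INCs are handled (dropped, or counted as errors) is a separate choice that this thread does not settle, and no closed-form SEM is offered here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from a 2x2 table: (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts, not the study data.
    print(matthews_corrcoef(tp=25, tn=24, fp=4, fn=3))
```

Because the sample MCC is a single statistic of the whole table, resampling the cases and recomputing it each time is a straightforward route to an interval.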


r

------------------
"Gentlemen, you can't fight in here. This is the war room."
--(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)


Barry C
Member
posted 06-06-2011 10:51 PM
I'm not sure what you mean with regard to the MCC, and I left it out because I'm not 100% sure of it. Since it's a proportion, the calculations should be workable from the numerator and denominator if those are calculated separately in other cells, but it's late and I'm too tired to think - especially since I'm unsure of the question.

Are you calculating CIs from the bootstrapped samples or just using the upper and lower 2.5th percentiles?

[This message has been edited by Barry C (edited 06-07-2011).]

rnelson
Member
posted 06-07-2011 08:00 AM
Barry,

I typically do both: calculate the CI, and take the percentiles. I report the calculated CI. They are usually very close, except when there is a ceiling or floor (0 or 100) percent. So, the percentiles are a quick and dirty way of evaluating for normal distribution.
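The two approaches described above can be compared with a minimal sketch like this one. It resamples a list of 0/1 decision outcomes, then reports both a calculated interval (estimate plus or minus 1.96 bootstrap standard errors) and the 2.5th and 97.5th percentiles of the resampled values. The function name, the number of resamples, the fixed seed, and the example data are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_ci(outcomes, n_boot: int = 2000, seed: int = 0):
    """Return (calculated_ci, percentile_ci) for the mean of 0/1 outcomes."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(outcomes)
    # Resample cases with replacement and recompute the proportion each time.
    boot = sorted(
        statistics.fmean(rng.choices(outcomes, k=n)) for _ in range(n_boot)
    )
    estimate = statistics.fmean(outcomes)
    se = statistics.stdev(boot)  # bootstrap standard error
    calculated = (estimate - 1.96 * se, estimate + 1.96 * se)
    percentile = (boot[int(0.025 * n_boot)], boot[int(0.975 * n_boot) - 1])
    return calculated, percentile


if __name__ == "__main__":
    # Hypothetical outcomes, not the study data: 25 correct decisions, 3 errors.
    calculated, percentile = bootstrap_ci([1] * 25 + [0] * 3)
    print("calculated:", calculated)
    print("percentile:", percentile)
```

Near a ceiling, the calculated interval can extend past 1.0 while the percentile interval cannot, which is the kind of disagreement between the two that the post above describes.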

r

------------------
"Gentlemen, you can't fight in here. This is the war room."
--(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)


All times are PT (US)
